Coherent multi-dimensional segmentation of multiview images using a variational framework and applications to image based rendering
Image Based Rendering (IBR) and in particular light field rendering has attracted a lot of
attention for interpolating new viewpoints from a set of multiview images. New images of
a scene are interpolated directly from nearby available ones, thus enabling a photorealistic
rendering. Sampling theory for light fields has shown that exact geometric information
about the scene is often unnecessary for rendering new views. Indeed, the plenoptic function
is approximately bandlimited, so new views can be rendered using classical interpolation
methods. However, IBR using undersampled light fields suffers from aliasing effects and
is particularly difficult when the scene has large depth variations and occlusions. In order
to deal with these cases, we study two approaches:
New sampling schemes have recently emerged that are able to perfectly reconstruct
certain classes of parametric signals that are not bandlimited but characterized by a finite
number of parameters. In this context, we derive novel sampling schemes for piecewise
sinusoidal and polynomial signals. In particular, we show that a piecewise sinusoidal signal
with arbitrarily high frequencies can be exactly recovered given certain conditions. These
results are applied to parametric multiview data that are not bandlimited.
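The parametric recovery idea above can be illustrated with the classical annihilating-filter (Prony) method, which recovers the frequencies of a sum of complex sinusoids from a handful of uniform samples. This is a minimal toy sketch of finite-rate-of-innovation-style sampling, not the paper's piecewise-sinusoidal scheme; the function name and setup are assumptions for illustration.

```python
import numpy as np

def prony_frequencies(samples, K):
    """Recover the K frequencies of a sum of K complex sinusoids from
    uniform samples via the annihilating-filter (Prony) method."""
    N = len(samples)
    # Each row [x[n], x[n-1], ..., x[n-K]] is annihilated by the filter h.
    H = np.array([samples[i:i + K + 1][::-1] for i in range(N - K)])
    # The annihilating filter is the null-space vector of H.
    _, _, Vh = np.linalg.svd(H)
    h = Vh[-1].conj()
    # Roots of the filter polynomial are exp(j*omega_k).
    roots = np.roots(h)
    return np.sort(np.angle(roots) % (2 * np.pi))

# Example: two sinusoids with frequencies 0.5 and 1.3 rad/sample.
n = np.arange(10)
x = np.exp(1j * 0.5 * n) + np.exp(1j * 1.3 * n)
print(prony_frequencies(x, K=2))  # ~ [0.5, 1.3]
```

With noiseless samples, 2K + 1 samples already suffice; the extra rows here make the null-space estimate numerically robust.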
We also focus on the problem of extracting regions (or layers) in multiview images
that can be individually rendered free of aliasing. The problem is posed in a multidimensional
variational framework using region competition. Extending previous
methods, layers are treated as multi-dimensional hypervolumes, so the segmentation
is performed jointly over all the images and coherence is imposed throughout the
data. However, instead of propagating active hypersurfaces, we derive a semi-parametric
methodology that takes into account the constraints imposed by the camera setup and the
occlusion ordering. The resulting framework is a global multi-dimensional region competition that is consistent in all the images and efficiently handles occlusions. We show the
validity of the approach on captured light fields, and also demonstrate special effects
such as augmented reality and the disocclusion of hidden objects.
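The occlusion-ordering idea in the abstract above can be sketched as simple back-to-front compositing: once layers have been extracted and rendered individually, nearer layers overwrite farther ones wherever their masks are set. This is a toy stand-in for layer-based IBR, not the paper's variational method; the function and data layout are assumptions.

```python
import numpy as np

def composite_layers(layers, masks):
    """Composite a list of (H, W, 3) layer images under a known
    occlusion ordering (farthest layer first). Each (H, W) boolean
    mask marks where its layer is visible."""
    out = np.zeros_like(layers[0])
    for img, mask in zip(layers, masks):  # ordered farthest to nearest
        out = np.where(mask[..., None], img, out)
    return out

# Example: a red foreground pixel occluding a blue background.
bg = np.zeros((2, 2, 3)); bg[..., 2] = 1.0   # blue everywhere
fg = np.zeros((2, 2, 3)); fg[..., 0] = 1.0   # red layer
bg_mask = np.ones((2, 2), dtype=bool)
fg_mask = np.array([[True, False], [False, False]])
out = composite_layers([bg, fg], [bg_mask, fg_mask])
```

Rendering each layer free of aliasing before compositing is what avoids the artifacts that a single undersampled interpolation would produce.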
When does Privileged Information Explain Away Label Noise?
Leveraging privileged information (PI), or features available during training
but not at test time, has recently been shown to be an effective method for
addressing label noise. However, the reasons for its effectiveness are not well
understood. In this study, we investigate the role played by different
properties of the PI in explaining away label noise. Through experiments on
multiple datasets with real PI (CIFAR-N/H) and a new large-scale benchmark
ImageNet-PI, we find that PI is most helpful when it allows networks to easily
distinguish clean from noisy data, while enabling a learning shortcut to
memorize the noisy examples. Interestingly, when PI becomes too predictive of
the target label, PI methods often perform worse than their no-PI baselines.
Based on these findings, we propose several enhancements to the
state-of-the-art PI methods and demonstrate the potential of PI as a means of
tackling label noise. Finally, we show how we can easily combine the resulting
PI approaches with existing no-PI techniques designed to deal with label noise.
Comment: Accepted at ICML 2023, Honolulu.
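One simple way privileged information can "explain away" label noise, in the spirit of the abstract above, is to down-weight the loss on examples whose PI flags them as likely mislabeled. This is a schematic sketch, not the paper's method; the noise-score PI and the function name are assumptions.

```python
import numpy as np

def pi_weighted_nll(probs, labels, pi_noise_score):
    """Negative log-likelihood reweighted by privileged information.

    probs:          (N, C) predicted class probabilities
    labels:         (N,) observed, possibly noisy, integer labels
    pi_noise_score: (N,) privileged estimate in [0, 1] that each label
                    is noisy (e.g. derived from annotator disagreement),
                    available at training time only
    """
    nll = -np.log(probs[np.arange(len(labels)), labels])
    weights = 1.0 - pi_noise_score   # trust clean-looking labels more
    return float((weights * nll).sum() / weights.sum())

# Example: the second label is flagged as noisy, so only the first
# example contributes to the loss.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 0])
loss = pi_weighted_nll(probs, labels, np.array([0.0, 1.0]))
```

Because the PI is dropped at test time, the network never learns to depend on it for prediction, only for deciding which labels to trust during training.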
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
We introduce Three Towers (3T), a flexible method to improve the contrastive
learning of vision-language models by incorporating pretrained image
classifiers. While contrastive models are usually trained from scratch, LiT
(Zhai et al., 2022) has recently shown performance gains from using pretrained
classifier embeddings. However, LiT directly replaces the image tower with the
frozen embeddings, excluding any potential benefits of contrastively training
the image tower. With 3T, we propose a more flexible strategy that allows the
image tower to benefit from both pretrained embeddings and contrastive
training. To achieve this, we introduce a third tower that contains the frozen
pretrained embeddings, and we encourage alignment between this third tower and
the main image-text towers. Empirically, 3T consistently improves over LiT and
the CLIP-style from-scratch baseline for retrieval tasks. For classification,
3T reliably improves over the from-scratch baseline, and while it underperforms
relative to LiT for JFT-pretrained models, it outperforms LiT for ImageNet-21k
and Places365 pretraining.
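The three-tower objective described above can be sketched as a standard symmetric contrastive (CLIP-style) loss between the image and text towers, plus alignment terms pulling both trainable towers toward the frozen pretrained tower. This is a schematic reading of the abstract; the exact form of 3T's alignment losses and the weight `w` are assumptions.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric CLIP-style contrastive loss between two embedding sets
    with matched rows (a[i] pairs with b[i])."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    def xent(l):
        l = l - l.max(axis=1, keepdims=True)            # stabilize
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()                   # matched pairs
    return 0.5 * (xent(logits) + xent(logits.T))

def three_tower_loss(img, txt, frozen, w=0.5):
    """Main image-text contrastive term, plus alignment of both
    trainable towers to the frozen pretrained (third) tower."""
    return (info_nce(img, txt)
            + w * (info_nce(img, frozen) + info_nce(txt, frozen)))
```

The frozen tower only supplies targets for the alignment terms, so the image tower still receives contrastive gradients, which is the flexibility 3T adds over simply freezing the image tower as in LiT.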